To illustrate a system for analyzing research reports, four published evaluations of the Biological Sciences Curriculum Study (BSCS) program were analyzed in terms of problem raised, previous work cited, objectives stated, hypotheses formulated, assumptions made, population studied, sample drawn, instruments used, design examined, procedure followed, safeguards taken, observations recorded, findings assembled, statistics interpreted, interpretations discussed, conclusions reached, limitations recognized, further work projected, improvements suggested, and clarity of report. The analysis is reported as a chart with each aspect of each report graded from A to E according to the author's judgment of the strength of the study in that area. The author concludes that the case for BSCS has yet to be proved. (EB)
AN ANALYSIS OF PUBLISHED EVALUATIONS OF BSCS



Studying reports of research projects can be quite frustrating. What was the investigator really doing? How well did he standardize his instruments, sample his subjects, assemble his data? Where did he introduce safeguards to protect the integrity of his study? Why are his conclusions different from those of other workers? Was the work itself sound, but the reporting inadequate?

Even more confusing is the task of summarizing a number of studies in the same area. Locating the reports is a problem (though ERIC (1) is now a help), but more difficult is reducing the published papers to some common bases so that their findings can be compared. The usual compromise is to accept the conclusions of the authors as stated and let it go at that; but what if the conclusions are unwarranted, as 36 per cent were in one study (8), or if the projects are so different as to make comparisons almost meaningless? These questions face all consumers and reviewers of research and, because they are so complex, leave those who would like to utilize or explain research findings more or less at sea in the middle of research.

To help bring some order into the processes of evaluating and reviewing published studies, I have developed a fairly simple set of Guidelines, described elsewhere (7), for analyzing research reports. (A reprint of this paper is attached.) These Guidelines, which are easy to apply and which focus on the chief qualities of good research, comprise the following twenty criteria:



1. Problem raised
2. Previous work cited
3. Objectives stated
4. Hypotheses formulated
5. Assumptions made
6. Population studied
7. Sample drawn
8. Instruments used
9. Design examined
10. Procedures followed
11. Safeguards taken
12. Observations recorded
13. Findings assembled
14. Statistics interpreted
15. Interpretations discussed
16. Conclusions reached
17. Limitations recognized
18. Further work projected
19. Improvements suggested
20. Clarity of report



The above criteria, which are also useful for analyzing individual reports, for planning projects and for writing proposals, can be used to compare and review a number of reported studies within a given field. Such an analysis, in the form of a chart, of the known assessments of the three BSCS versions in high school biology constitutes the body of this paper. A literature search has revealed only four published summative evaluations, although there are some unpublished dissertations (2, 3, 4) and some published reports that use BSCS materials in dealing with other research topics (for example, 5, 6).

Each criterion above is used in the chart to describe, concisely and critically, some aspect of each study, with enough information to yield a fairly adequate résumé of the entire published evaluation.

Each aspect of each report is also graded to indicate how well, in my own subjective opinion, each criterion has been met, according to the following scale: A, for a complete and clear statement fully satisfying the criterion; B, for a fairly good statement, but lacking some essential quality; C, for a weak statement, or a strong implication somewhere in the report; D, for a quite inadequate or confusing statement; and E, for the lack of either a statement or an implication. It would be wrong to transform the letter grades to numerical values in order to compute a total score or a mean score, since the twenty criteria do not all have equal weight.

To be completely fair to the authors of the papers 
analyzed, the chart has been submitted to them for their 
comments and has been partly revised in the light of their 
criticisms. It is also fair to point out that published 
papers are sometimes altered or shortened by editors. 

Although the chief purpose of this chart is to illustrate how the Guidelines may be applied, it is also possible to draw from it some conclusions about the subject matter, that is, the results of the evaluations of the BSCS curricula. My own general opinion, after this review, is that the case for BSCS has yet to be proved. That is a pity. After hundreds of competent and enthusiastic people have spent a vast number of man-hours in developing what is obviously a fresh, bright, complete and up-to-date series of curricula for secondary school biology, it seems a shame that more conclusive published evidence is not yet available as to its validity, that is, sound proof that the BSCS program truly accomplishes what it sets out to do. Good evaluation strategy (9) would seem to require proper sampling (10), if the findings are to be honestly generalized, as well as carefully standardized assessments of growth and achievement, control groups and other design safeguards. After all, in evaluating the effects of science teaching, the emphasis should be on scientific rigor.
PUBLISHED REPORT

Grobman, H., Wallace, and Klinkmann, E. "The BSCS 1961-62 Evaluation Program" (in four papers). BSCS Newsletter, No. 19, September 1963.

Lisonbee, L., and Fullerton, B.J. "The Comparative Effects of BSCS and Traditional Biology on Student Achievement." School Science and Mathematics, 64:594-598; October 1964.

George, K.D. "The Effects of BSCS and Conventional Biology on Critical Thinking." Journal of Research in Science Teaching, 3:293-299; 4, 1965.

Staff of Psychological Corp. "A Report of the BSCS End-of-Year Evaluation Program, 1964-1965." BSCS Newsletter, No. 30, January 196?.



1. PROBLEM RAISED

Grobman et al.: Implied: How effective was the BSCS program in the three versions, and also when compared to non-BSCS biology, in its trial use during 1961-62?

Lisonbee and Fullerton: Would high, middle and low ability students from different schools enrolled in the BSCS program do as well on achievement tests as those in traditional biology classes?

George: Would the BSCS program be more successful than conventional biology in developing critical thinking ability?

Psychological Corp.: Implied: How would students perform on the different forms of the different BSCS achievement tests, and how would this be related to academic ability and to reading skills?
2. PREVIOUS WORK CITED

Grobman et al.: Only two encyclopedic books.

Lisonbee and Fullerton: Several references about the development of this (then) new program.

George: Four old studies in non-science areas, but none of the many papers on scientific thinking, nor any recent reports.

Psychological Corp.: No references.

3. OBJECTIVE STATED

Grobman et al.: Implied: To justify the worth of the BSCS program.

Lisonbee and Fullerton: To measure the effect of BSCS vs traditional biology on student achievement.

George: To test the effect of BSCS vs conventional biology on development of critical thinking.

Psychological Corp.: Implied: To develop good standardized tests for the BSCS program.







4. HYPOTHESES FORMULATED

Grobman et al.: Implied in the tabulation of findings: There would be no significant differences on three achievement post-tests and three attitude post-measures between male and female 10th graders taught with or without lab blocks, with Blue, Green or Yellow versions, compared with male and female control groups using non-BSCS materials, and with 9th graders using the three versions.

Lisonbee and Fullerton: Clear statement of eight hypotheses: There would be no significant differences in achievement, with CTMM and ITED scores held constant, on the Nelson Biology Test or the BSCS Comprehensive Final, between BSCS vs traditional classes of high, middle and low ability, nor among students in different schools.

George: Clear statement of four hypotheses: There would be no significant differences in critical thinking ability between pupils taught by Blue, Green or Yellow versions vs pupils taught by conventional biology, nor among pupils taught by the three versions.

Psychological Corp.: Implied in the tabulation of findings: There would be no significant differences in scores of 10th graders on two parallel forms of each of the three versions' Quarterly Achievement Tests, nor on two parallel forms of the BSCS Comprehensive Final, when compared by sex, academic ability, BSCS version or reading ability.

5. ASSUMPTIONS MADE

Grobman et al.: None stated.

Lisonbee and Fullerton: None stated.

George: Listed 8 assumptions, and defended 7 of them by citations and evidence.

Psychological Corp.: None stated.

No other information on this item.



6. POPULATION STUDIED

Grobman et al.: Neap 3? centers in the U.S., 361 teachers taught 39,000 students, stratified (but not matched) by grade, sex, lab block or non-block use, and BSCS version. Control groups, matched by teacher and school, of 3914 students taught without BSCS materials by 136 teachers. Academic ability of students above average.

Lisonbee and Fullerton: 3500 10th grade biology students who were tested for homogeneity by Bartlett's Test.*

George: In four suburban Chicago schools, 19 classes ranging in size from 1? to 2? were taught by 10 volunteer teachers. Classes, teachers and facilities were found equivalent by a questionnaire.* Academic ability of students above average.

Psychological Corp.: 9,846 10th graders took achievement tests; 996 took the Davis Test and 90? took the Illinois Test.* "Among versions, participating schools were similar as to type of community, and as to type, size and facilities of schools." Teachers had similar education, experience and work loads.* "Statistical information may be found in the BSCS Manual" but not in this report.

7. SAMPLE DRAWN

Grobman et al.: None; only the 16 sub-populations to which the findings can be said to apply.

Lisonbee and Fullerton: By random sampling chose experimental and control groups, stratified into high, middle and low ability and by schools.

George: "No claim is made as to the representativeness of the biology classes."

Psychological Corp.: Sampling hinted at.*

No other information on this item.
8. INSTRUMENTS USED

Grobman et al.: SCAT (B) for academic ability. BSCS Comprehensive Final and Coop Biology Test (Y) for achievement. Impact Test B for reasoning ability and understanding. Three attitude measures from TOUS, Purdue attitude scale and semantic differential. Questionnaire on teacher background and schools.

Lisonbee and Fullerton: California Test of Mental Maturity (CTMM) for academic ability. Iowa Test of Educational Development (ITED) for scholastic ability. Nelson Biology Test and BSCS Comprehensive Final for achievement.

George: Otis Quick-Scoring Test (Gamma Fm) for academic ability. Watson-Glaser Critical Thinking Appraisal (Rev. Zm).

Psychological Corp.: DAT (L) for academic ability. BSCS Quarterly Achievement Tests (R and S) for each BSCS version. BSCS Comprehensive Final (I and II). Davis Reading Test (2A and 2D) and the Illinois Natural Science Reading Comprehension Test, for reading skills.

(Note: None of the above instruments was defended in any report.)

9. DESIGN EXAMINED

Grobman et al.: Independent variable: SCAT scores, groups equated by covariance. Dependent variables: Post-tests on achievement and attitudes (but without pre-tests, no measure of gains).

Lisonbee and Fullerton: Independent variables: CTMM and ITED scores. Dependent variables: Post-tests on achievement (but without pre-tests, no measure of gains).

George: Independent variables: IQ scores, equated by covariance, and pre-test scores on W-G. Dependent variables: Post-test scores on W-G.

Psychological Corp.: Independent variables: DAT scores and pre-tests on reading. Dependent variables: Two post-tests on reading and six post-tests (but no pre-tests) on achievement.
10. PROCEDURES FOLLOWED

Grobman et al.: All students took SCAT at start of trial year. In 10th grade, 29 classes used Blue, 39 Green and 49 Yellow versions with lab blocks, while ?7 used Blue, 52 Green and 39 Yellow without blocks. In 9th grade, 5 classes used Yellow with blocks, while 11 used Blue, 12 Green and 6 Yellow without blocks. 125 classes did not use BSCS materials. All students took 3 achievement and 3 attitude tests at end of year; BSCS students also took Quarterly Achievement Tests. All teachers filled in the described questionnaire. (Explained here and there; also implied in tables.)

Lisonbee and Fullerton: No description. (Full details in dissertation, but unpublished.)

George: All students took Watson-Glaser as pre-test in September and Otis in mid-year. BSCS taught in 13 classes by 6 teachers: Blue by one, Yellow by two and Green by three. Four teachers taught conventional biology in 6 classes. All students took Watson-Glaser as post-test in May.

Psychological Corp.: All students took DAT at start of trial year. Some groups took Davis (2D) and others took Illinois as pre-tests. Blue version studied by 3847, Green by 2500 and Yellow by 3499 students, who took different Quarterly Tests (R or S) and also BSCS Comprehensive Final (both I and II) at end of trial year. Some groups took Davis (2A) and others the Illinois as post-tests.

No other information on this item.
11. SAFEGUARDS TAKEN

Grobman et al.: Evaluation Committee, aided by three sub-committees and Educational Testing Service, formulated evaluation program. Wide geographic, socio-economic and cultural population coverage. Continuous feedback in written comments, consultant visits, etc. (many examples cited).

Lisonbee and Fullerton: None indicated.

George: Pre-test and IQ scores held constant. Pooling of data defended. Both t-tests and F-values used. "The statistical techniques employed warrant confidence in the results obtained." (But see Welch, 9.)

Psychological Corp.: None indicated.

12. OBSERVATIONS RECORDED

Grobman et al.: No description of data collection methods. No examples of raw scores.

Lisonbee and Fullerton: Same as above.

George: Same as above.

Psychological Corp.: Same as above.
13. FINDINGS ASSEMBLED

Grobman et al.: Data shown in 9 tables: Mean scores on 3 post-tests of achievement by sex and treatment, and by sex and grade for non-block groups. Mean differences between various combinations. Correlations between ability and achievement scores. No data from attitude measures. Some verbal generalizations from questionnaires.

Lisonbee and Fullerton: No tables of data, no numerical values reported. (Many tables in unpublished dissertation.)

George: Data shown in 6 tables: Numbers of pupils, teachers and classes in groups. Summary of pooled data. Analysis of variance and covariance, adjusted means.

Psychological Corp.: Data shown in 14 tables: Raw means, adjusted means and SD's for DAT, Quarterly and Comprehensive tests by sex, BSCS version and test forms, and by t-tests. Correlations between Davis vs DAT, Davis vs Comprehensive, Illinois vs DAT, and Illinois vs Comprehensive.

14. STATISTICS INTERPRETED

Grobman et al.: Mean scores, but no standard deviations. Correlations computed and levels of significance adduced (r's of -.09, .07, .17 and .2? called significant).

Lisonbee and Fullerton: F-values from four analyses of covariance used to accept or reject null hypotheses at the .05 level. Also some t-tests.

George: Adjusted means, t-tests, analysis of variance. Covariance to make pre-tests and IQ scores equivalent. Data pooled.

Psychological Corp.: Product-moment correlations. Some t-tests.
15. INTERPRETATIONS DISCUSSED

Grobman et al.: Correlations of SCAT scores vs end-of-year tests were high and "make it quite clear that performance on these three tests is highly dependent on academic ability." Differences among groups using the three BSCS versions "were for the most part negligible and inconsistent from one test to the next." Males did better than females, yet "it is inappropriate to conclude that the boys were superior." "Superior 9th grade students do as well as unselected 10th graders." "The findings of the BSCS Evaluation Program -- through feedback and testing -- indicate" 9 listed generalizations (but some seem hardly warranted by the data shown).

Lisonbee and Fullerton: "The results of the study may indicate that students through the BSCS program learn the important core of information of the traditional plus the new, updated biology knowledge incorporated in the BSCS course." (Unwarranted by the data shown, and not even tested.)

George: Instruction in BSCS was neither inferior nor superior to conventional biology in improving critical thinking ability, within the terms and limitations of this study. (But there is no evidence that gains were due to biology teaching and not to increasing maturity, for no controls were used. Also see 9.)

Psychological Corp.: "None of the differences between pairs of test forms (R and S, I and II) is of any practical significance." "These differences (between three BSCS versions) demonstrate merely that one group does slightly better than another on these particular tests." "No consistent trends appear indicating that academic ability is a better predictor of one test than another." "Little, if any, gain would result from using both the DAT and a reading test in predicting BSCS final achievement."






16. CONCLUSIONS REACHED

Grobman et al.: "All BSCS groups substantially outperformed the control group on the BSCS Comprehensive Final." The control group "greatly excelled all BSCS groups on the Cooperative Biology Test" and "was slightly superior to all BSCS groups on the BSCS Impact B Test." "While 11 of the 12 block-nonblock comparisons were significant, all differences were small." On two attitude measures, differences between BSCS and control groups were negligible. There was little or no relationship between Comprehensive Final results and traits of teachers or school.

Lisonbee and Fullerton: "The experimental and control groups did not differ significantly on Nelson Biology Test." "The middle and high experimental groups excelled significantly over the middle and high control groups on BSCS Achievement test." "No significant differences appeared in achievement between middle and high ability levels on either of the two tests" nor "between the low ability groups on the BSCS test." "No significant differences appeared among schools on the Nelson but did so on the BSCS test."

George: Hypotheses 2 and 3 were accepted: Those in Green and Yellow versions were not significantly better than those in conventional biology. Hypotheses 1 and 4 were rejected: Pupils in the Blue version were significantly better than those in conventional biology and those in the Yellow version, who were in turn better than those in the Green version.

Psychological Corp.: "BSCS students on the DAT were between 6?th and 7?th percentiles" of a national sample. "DAT was substantially related" to Quarterly and Comprehensive Final test performance. "Students in Blue version had the highest means on every ability, achievement and final test, while those in Green version had the lowest." "In most groups the males outperformed the females on all tests" but only by a few points. DAT was highly correlated with Davis and Illinois reading tests.
17. LIMITATIONS RECOGNIZED

Grobman et al.: Volunteer teachers and schools. Students above average ability. Some BSCS objectives were not susceptible to quantitative measurement.

Lisonbee and Fullerton: Tests did not really measure low ability students. Teachers and teaching climate may have influenced results. Population limited to a single community.

George: Impossible to hold constant the competence, methods, experience, philosophy and preparation of teachers. With only one teacher for Blue version, high scores may have reflected his good teaching. Conclusions depended upon validity and reliability of instruments.

Psychological Corp.: None acknowledged.

18. FURTHER WORK PROJECTED

Grobman et al.: Mentioned need for replication of evaluation studies.

Lisonbee and Fullerton: Check adequacy of tests for slow learners, and influence of teachers and teaching climate. Follow-up retention study of same pupils. Replicate with a more universal population.

George: None proposed.

Psychological Corp.: None proposed.
19. IMPROVEMENTS SUGGESTED

Grobman et al.: Clearly stated problem, objective and hypotheses. Fully described population and proper sampling. Instruments defended. Pre-tests. Assumptions and limitations recognized. Only warranted conclusions drawn. Better organization of report.

Lisonbee and Fullerton: Larger population, better described. Procedure in full. Pre-tests with defended instruments. Tables with data and statistical treatment. Assumptions, safeguards, limitations recognized. Conclusions defended.

George: Larger population and proper sampling. More carefully chosen teachers. Standardized teaching program. Replication within project.

Psychological Corp.: Clearly stated problem, objective and hypotheses. Proper sampling of fully described population. Pre-tests on achievement. Control groups. Assumptions, safeguards, limitations recognized. Better statistical treatment of data. Conclusions fully defended.



20. CLARITY OF REPORT

Grobman et al.: Writing clear and straightforward, with little jargon. Organization of report very poor, with chief elements scattered and buried; so much data deserved better planning, more rigor and more adequate reporting.

Lisonbee and Fullerton: Written honestly; well organized and clear. But description of procedure and presentation of results much too scanty, leaving conclusions suspect.

George: Relatively little jargon, but considerable repetition. Clear descriptions and good organization.

Psychological Corp.: Writing good, with little jargon. But a report for general readers should have given more information (even if in BSCS Manual) on students, procedure, and instruments.